Okay, so in this example, things were relatively simple, and I rigged them to be simple.
But if we do the general case, then the expected utility of an action given some evidence E
was the maximum over all actions A of the sum, over all states i, of the utility of state Si
times the probability that Si actually obtains given the evidence and the action.
That's what we've seen before, except for some reason I've turned this around with respect
to that formula.
So we can treat the outcome of our information gathering as new evidence, right?
And with that, we could choose some specific optimal action such that we maximize here.
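Written out, the two quantities just described look roughly as follows. The transcript does not show the slide, so the notation is an assumption borrowed from the usual textbook presentation: \alpha is the best action under the current evidence E, and \alpha_{e_{jk}} is the best action once the new evidence E_j = e_{jk} has come in.

```latex
\begin{align*}
EU(\alpha \mid E) &= \max_{A} \sum_{i} U(S_i)\, P(S_i \mid E, A) \\
EU(\alpha_{e_{jk}} \mid E, E_j = e_{jk}) &= \max_{A} \sum_{i} U(S_i)\, P(S_i \mid E, A, E_j = e_{jk})
\end{align*}
```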
So we can treat Ej in principle as a random variable whose value is currently unknown,
because we only learn it by taking the information-gathering action.
And that really gives me the value of perfect information, which is the information we're
getting if we actually have full information about that variable, if we can observe it
fully.
Which in the oil drilling example is actually not the case, because in the real world, the survey
only tells you the likelihood of oil being down there, because we are not observing the oil
directly, but some geologic formations which often have oil.
So we can compute the value of perfect information by summing over all possible values
of Ej: a weighted sum of the expected utility of the best action given the new evidence,
times the probability of that evidence actually being obtained given the evidence we
already had before.
So we have a general formula that computes the value of perfect information of a certain
unknown variable: given the evidence E that we already have, the value of testing the value of Ej.
Big, at least double sum, minus another big sum, but that's something we can compute.
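As a formula, the "double sum minus another sum" can be written like this. The symbols e_{jk} (the k-th possible value of E_j), \alpha, and \alpha_{e_{jk}} are assumed notation from the standard textbook treatment, not read off the slide.

```latex
\[
VPI_E(E_j) = \Big( \sum_{k} P(E_j = e_{jk} \mid E)\; EU(\alpha_{e_{jk}} \mid E, E_j = e_{jk}) \Big) - EU(\alpha \mid E)
\]
```

The first term is the expected utility when we get to see E_j before acting; the second is the expected utility of acting on the evidence E alone.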
We've seen all these here before.
We know how to compute them if we have, say, a Bayesian network and utilities.
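To make the computation concrete, here is a minimal Python sketch of the formula. Everything in it is an illustrative assumption: the function names, the outcome labels, and the numbers (a 0.3 prior for oil, utilities of 10, -2, and 0) are made up and are not the figures from the lecture's oil example. The probability tables are entered by hand where a Bayesian network would normally supply them.

```python
def best_expected_utility(outcome_dist, utility):
    """max over actions A of sum_i U(S_i) * P(S_i | evidence, A).

    outcome_dist[action][outcome] holds P(outcome | evidence, action).
    """
    return max(
        sum(p * utility[outcome] for outcome, p in dist.items())
        for dist in outcome_dist.values()
    )


def value_of_perfect_information(p_value, dist_given_value, dist_now, utility):
    """Expected best utility after observing E_j, minus best utility without it.

    p_value[v] holds P(E_j = v | E); dist_given_value[v] holds the outcome
    distributions per action once E_j = v has been observed.
    """
    with_info = sum(
        p * best_expected_utility(dist_given_value[v], utility)
        for v, p in p_value.items()
    )
    return with_info - best_expected_utility(dist_now, utility)


# Hypothetical numbers, chosen only for illustration.
UTILITY = {"strike": 10.0, "dry": -2.0, "idle": 0.0}
P_OIL = {"oil": 0.3, "no_oil": 0.7}

# Outcome distributions per action under the current evidence alone.
DIST_NOW = {
    "drill": {"strike": 0.3, "dry": 0.7},
    "pass": {"idle": 1.0},
}

# Outcome distributions per action once the oil variable is known exactly.
DIST_GIVEN = {
    "oil": {"drill": {"strike": 1.0}, "pass": {"idle": 1.0}},
    "no_oil": {"drill": {"dry": 1.0}, "pass": {"idle": 1.0}},
}

VPI = value_of_perfect_information(P_OIL, DIST_GIVEN, DIST_NOW, UTILITY)
print(f"VPI of observing the oil variable: {VPI:.2f}")
```

With these made-up numbers, acting without the observation is worth about 1.6 (drill), and acting after a perfect observation is worth about 3.0 in expectation, so the observation is worth about 1.4.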
Now we can ask ourselves what are the properties of this value of information function.
So it's always non-negative, where you need to understand that it's non-negative only in expectation.
If we in our oil drilling example paid half a million for that survey and that survey
tells us no oil, it wasn't worth it.
But the value is always positive.
I'm sorry, non-negative of course.
The value might be zero, in which case we shouldn't test.
Again, the value of information is something about expectation.
After the test it might come out as having lost money, which is okay.
It was still valuable to do the test because it gave you a higher expected utility.
It's non-additive.
So if you test the same thing twice, you're not actually getting the value twice.
The second time you're testing, that actually gives you no new information.
So the value the second time is zero.
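A small Python sketch of this point, under simplifying assumptions that are not from the lecture: the state probabilities do not depend on the action, the utility depends on the action and the state, and all names and numbers are hypothetical. It computes the value of observing the oil variable once, and then the expected value of observing the same variable again afterwards.

```python
def best_eu(belief, utility):
    """max over actions of sum_s P(s) * U(action, s)."""
    return max(
        sum(belief[s] * u[s] for s in belief)
        for u in utility.values()
    )


def certain(belief, state):
    """Belief after observing the variable exactly: all mass on one state."""
    return {s: 1.0 if s == state else 0.0 for s in belief}


def vpi(belief, utility):
    """Value of observing the state variable exactly, given the current belief."""
    with_info = sum(
        p * best_eu(certain(belief, s), utility)
        for s, p in belief.items()
    )
    return with_info - best_eu(belief, utility)


# Hypothetical numbers, chosen only for illustration.
UTILITY = {
    "drill": {"oil": 10.0, "no_oil": -2.0},
    "pass": {"oil": 0.0, "no_oil": 0.0},
}
PRIOR = {"oil": 0.3, "no_oil": 0.7}

first_test = vpi(PRIOR, UTILITY)

# Expected value of testing again, averaged over what the first test showed.
second_test = sum(
    p * vpi(certain(PRIOR, s), UTILITY)
    for s, p in PRIOR.items()
)

print(f"first test:  {first_test:.2f}")
print(f"second test: {second_test:.2f}")
```

The first test has positive value, while the second one is worth nothing, because after the first observation the belief is already certain and no outcome of the second test can change the chosen action.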
But it's order independent, which is a somewhat surprising fact to me.
Yes.
But you have to be careful here.
It's not a naive order independence.
Think about the distribution of the evidence here.
If you do this test first, then you have to take that into account the second time.
But it turns out that...
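For reference, the order independence property is usually stated as follows in the standard textbook treatment. The subscript notation is an assumption: VPI_{E, E_j}(E_k) means the value of observing E_k when E_j has already been added to the evidence.

```latex
\[
VPI_E(E_j, E_k) = VPI_E(E_j) + VPI_{E, E_j}(E_k) = VPI_E(E_k) + VPI_{E, E_k}(E_j)
\]
```

The second term in each sum is evaluated under the updated evidence, which is why this is not a naive order independence.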
Access: open access
Duration: 00:18:20
Recording date: 2021-03-29
Uploaded: 2021-03-30 14:06:31
Language: en-US
A general formula to compute the expected utility is given. Then the properties of VPI are discussed and a simple Information-Gathering Agent is explained.